Attention Detection by Heartbeat and Respiratory Features from Radio-Frequency Sensor

This work presents a study on detecting users' attention, with reference to a relaxed inattentive state, using an over-the-clothes radio-frequency (RF) sensor. Based on the RF near-field coherent sensing principle, this sensor couples strongly to internal heart, lung, and diaphragm motion, without requiring a tension chest belt or a skin-contact electrocardiogram. We use cardiac and respiratory features to distinguish attention-engaging vigilance tasks from a relaxed, inattentive baseline state. We demonstrate high-quality vital signs from the RF sensor compared with the reference electrocardiogram and respiratory tension belts, as well as similar attention-detection performance, while improving user comfort. Furthermore, we observed higher vigilance-attention detection accuracy using respiratory features than using heartbeat features. The users' baseline emotional and arousal levels strongly influenced the learning model; thus, individual models with personalized prediction were designed for the 20 participants, leading to an average accuracy of 83.2% on unseen test data, with a sensitivity and specificity of 85.0% and 79.8%, respectively.


Introduction
Ambient intelligence and intelligent machine responses [1,2] have become increasingly important in recent years, and both require an estimate of human cognitive reactions. Attention detection is a subset of cognition assessment that can help prevent accidents by warning users when they start slipping into an inattentive state. This is important for activities of daily living, including driving, as well as for certain occupations, such as the military [3,4], medicine, and aviation. With the increasing number of work-from-home jobs, such systems are even more important for individuals to monitor their own work fatigue and take recuperative measures.
When people perceive a vast amount of information, the processing of a subset is prioritized and extraneous, irrelevant information is filtered out; this is termed attention [5]. It is a basic function that simultaneously controls focus, vigilance, and response [6]. The two broad types of attention are endogenous and exogenous. The former is a top-down, goal-driven, voluntary process with the conscious expectation of events, while exogenous attention is a bottom-up, sensory-driven, involuntary response. The time course of attention ranges from short durations (a few milliseconds) to longer periods (a few seconds or minutes), the latter termed sustained attention or vigilance [7]. Some long-duration tasks, commonly associated with workplaces, require vigilance and may result in mental and physical fatigue.
The literature on attention-related concepts, including alertness, fatigue, and engagement, is extensive. Engagement is closely related to attentional involvement with a task and is mainly detected using facial expressions [8,9]. Fatigue-induced drowsiness impairs attention by decreasing the ability to suppress irrelevant information, leading to increased reaction times [10,11]. With more than 300,000 drowsy-driving crashes each year [12], numerous research works have made significant efforts towards driver fatigue and sleepiness detection [13], primarily using changes in the blink rate, percentage of eye […] classification. A questionnaire at the end of the study revealed varying baseline arousal and emotional states, including calmness, drowsiness, and anxiety. The major contributions of this work include the following:
1. A touchless RF sensor that measures both cardiac and respiratory waveforms, with attention-detection performance on par with the reference chest tension belts and ECG combined. The improved comfort and convenience can reduce systematic bias and improve applicability;
2. Cardiac and respiratory variability features were jointly employed by a learning model to derive the attention status every 10 s, which was more accurate than using the cardiac or respiratory features individually;
3. The critical role of personal baseline training was examined.
Section 2 presents the RF sensor and experimental setup. The algorithm for feature extraction is discussed in Section 3. Results are presented in Section 4, followed by the discussion and conclusions.

Sensor Setup
The hardware setup included two over-the-clothes RF NCS sensors placed at the thorax and abdomen levels on the midline, as shown in Figure 1a,b. The wired RF sensors were held in place by belts, with no tension requirement. The newer lightweight Bluetooth-enabled design allows for a more comfortable alternate placement [38]. The heartbeat signal was generally stronger in the thorax sensor, as it was placed closer to the heart, whereas the abdomen sensor captured stronger lung and diaphragm motion. Figure 1d shows typical heartbeat and respiration waveforms extracted from the NCS sensors. The sensor prototypes were implemented using a software-defined radio (SDR) transceiver by Ettus Research [39], operating at 1.82 GHz and 1.9 GHz with <−10 dBm power. A detailed description was presented in our previous work [32]. The reference sensor setup included a three-electrode ECG and thorax and abdomen chest belts by BIOPAC [40]. Note that the ECG electrodes required conductive gel pads in contact with bare skin, and the chest belts needed reasonable tension to capture the full respiratory motion.

Protocol
The protocol included two routines in a seated posture, namely relaxed inattentiveness (R) and vigilance-attention (A). In the former, participants were asked to relax with their eyes closed for 5 min, maintaining a state of inattentiveness. The next routine involved a stimulus-driven attention task demanding sustained vigilance, a modified Mackworth clock game [37]. A rotating clock hand was shown on the computer screen, and participants were expected to respond to larger clock-hand jumps by pressing the spacebar. A maximum reaction time (RT) of 1 s was allowed. The 6.5 min routine began with instructions and a trial run (1 min in total), followed by the same vigilance task for 5 min, and finally 30 s of looking at the screen for potential further instructions. As participants were expected to be attentive throughout the 6.5 min routine, the entire duration is considered the attention routine. The routine was designed using PsyToolkit software [41], which showed a clock hand rotating by a fixed step of 3.6°/s. The probability of a longer rotation of 5.4° at each step was set to 0.1. An indicator at the center of the clock gave instantaneous feedback on incorrect, missed, and correct responses with red, red, and green lights, respectively. Figure 2 shows the different possible clock states during the task.
Figure 2, panel (c): incorrect user response due to a missed abnormal rotation or a spacebar press for a normal clock rotation.
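The stimulus and scoring rules of the clock task can be sketched as follows. This is a minimal illustration, not the PsyToolkit script used in the study. It assumes one clock step per second, so a 5 min task has 300 steps. The names `make_schedule` and `score` are hypothetical, and so is the label "none" for an unanswered normal step, since the indicator only reported correct, missed, and incorrect responses.

```python
import random

# Protocol values from the text; one step per second is an assumption.
STEP_DEG = 3.6   # normal clock-hand step (degrees)
JUMP_DEG = 5.4   # longer "jump" the participant must detect
P_JUMP = 0.1     # probability of a jump at each step
MAX_RT = 1.0     # maximum allowed reaction time (s)


def make_schedule(n_steps, seed=0):
    """Return the rotation (degrees) applied at each clock step."""
    rng = random.Random(seed)
    return [JUMP_DEG if rng.random() < P_JUMP else STEP_DEG for _ in range(n_steps)]


def score(schedule, rts, max_rt=MAX_RT):
    """Label each step given rts[i], the spacebar RT (s) after step i, or None."""
    labels = []
    for rotation, rt in zip(schedule, rts):
        pressed = rt is not None and rt <= max_rt  # late presses count as no press
        if rotation == JUMP_DEG:
            labels.append("correct" if pressed else "missed")
        else:
            labels.append("incorrect" if pressed else "none")
    return labels


if __name__ == "__main__":
    schedule = make_schedule(300, seed=1)  # 5 min vigilance task
    print(sum(r == JUMP_DEG for r in schedule), "jumps in 300 steps")
```

Treating a press later than 1 s as no press mirrors the stated maximum RT. A jump answered too late is therefore scored as missed.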
The experimental study was approved by the Cornell Institutional Review Board (IRB) and conducted with the written consent of the participants. Data were collected from 20 healthy participants (12 female and 8 male), aged 18-34 years, with BMI in the range 18-27 kg/m². An end-of-study questionnaire recorded participants' feelings of stress, relaxation, calmness, anxiety, and alertness during both routines.

Sensor Data Preparation
Our setup collected timestamp-synchronized NCS and BIOPAC data, along with the timing of each clock-hand step and the keypress RT. The NCS respiration and heartbeat waveforms were modulated onto the baseband RF amplitude and phase waveforms and were extracted by filtering [42]. For respiration, low-frequency baseline variation was removed with a fifth-order Butterworth high-pass filter with a 3 dB cutoff frequency (f_{3dB}) of 0.05 Hz. A low-pass FIR Kaiser-window filter was then used to suppress the higher-frequency heartbeat waveform above 0.8 Hz. The resulting waveform was further processed by subtracting the mean of the first 60 s of data, followed by normalization by the RMS over the same duration. Similarly, the heartbeat waveform was extracted by a third-order high-pass filter with a 0.7 Hz f_{3dB} and a similar low-pass filter with a 1.9 Hz cutoff. These filters allow measurement of respiration rate (RR) and heart rate (HR) in the ranges of 6-40 and 45-115 breaths or beats per minute (BPM), respectively, comfortably covering the normal resting ranges. All vitals were down-sampled to a uniform sampling rate of f_s = 100 Hz before feature extraction.
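A minimal sketch of this filtering chain in Python with SciPy is shown below. The text does not specify several details, which are assumed here:

- zero-phase filtering (`sosfiltfilt`/`filtfilt`);
- a Butterworth design for the third-order heartbeat high-pass;
- a 0.3 Hz transition width with 60 dB attenuation for the Kaiser-window low-pass filters.

The function names are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin, kaiserord, sosfiltfilt

FS = 100.0  # Hz, common rate after down-sampling


def kaiser_lowpass(x, cutoff, fs, width=0.3, ripple_db=60.0):
    """Zero-phase FIR low-pass with a Kaiser window.

    width (Hz) and ripple_db are assumptions; the text gives only the cutoff.
    """
    numtaps, beta = kaiserord(ripple_db, width / (0.5 * fs))  # width as fraction of Nyquist
    numtaps |= 1                                              # force odd length (type I FIR)
    taps = firwin(numtaps, cutoff, window=("kaiser", beta), fs=fs)
    return filtfilt(taps, [1.0], x)


def extract_respiration(x, fs=FS):
    """Baseline removal, heartbeat suppression, then 60 s mean/RMS normalization."""
    sos = butter(5, 0.05, btype="highpass", fs=fs, output="sos")  # f_3dB = 0.05 Hz
    r = kaiser_lowpass(sosfiltfilt(sos, x), 0.8, fs)              # suppress > 0.8 Hz
    n60 = int(60 * fs)
    r = r - r[:n60].mean()                                        # mean of first 60 s
    return r / np.sqrt(np.mean(r[:n60] ** 2))                     # RMS of first 60 s


def extract_heartbeat(x, fs=FS):
    """Third-order high-pass at 0.7 Hz, then Kaiser low-pass at 1.9 Hz."""
    sos = butter(3, 0.7, btype="highpass", fs=fs, output="sos")   # Butterworth assumed
    return kaiser_lowpass(sosfiltfilt(sos, x), 1.9, fs)
```

Zero-phase filtering avoids shifting breath and beat timing, which matters for the interval features computed later. A causal implementation would instead need to compensate for the FIR group delay.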

Dual-Point NCS Measurement
As discussed earlier, our two-sensor placement measured heartbeat primarily from the thorax sensor and respiration from both the thorax and abdomen sensors. Furthermore, vital-sign modulations were observed in both the amplitude and phase of the baseband RF signal. Thus, the NCS measurement offered high redundancy, with four signal sources (thorax amplitude, thorax phase, abdomen amplitude, and abdomen phase) for respiration and heartbeat waveform extraction. This redundancy was particularly useful in the presence of external motion artifacts, which do not affect all four channels in the same manner. A signal-to-noise ratio (SNR) estimate was defined to identify the signal with the best quality. Signal and noise powers were estimated from the periodogram estimate of the power spectral density after baseline removal (f_{3dB} = 0.08 Hz). SNR was derived as follows:
$$\mathrm{SNR} = \frac{P_{f_1}}{P_{f_2} + P_{f_3}}$$
where P_{f_1} indicates the power in the desired signal frequency band f_1, and P_{f_2} and P_{f_3} are the powers in the noise bands f_2 and f_3 after filtering. For heartbeat, f_1 is [0.8, 2] Hz, f_2 is [0.05, 0.8) Hz, and f_3 is (2, 50] Hz, where f_s/2 = 50 Hz is half the sampling frequency. For respiration, f_1 is [0.1, 0.7] Hz, f_2 is [0.05, 0.1) Hz, and f_3 is (0.7, 50] Hz. Here, we did not differentiate between the intensity of thorax and abdomen breathing, and the waveform with the highest SNR was selected for feature extraction.
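The SNR-based channel selection can be sketched as below, using the band edges given above. Three choices are assumptions for illustration:

- the fourth-order Butterworth high-pass used for baseline removal;
- the use of `scipy.signal.periodogram`;
- the channel names.

Band powers are taken as sums over periodogram bins. Because the bins are equally spaced, these sums are proportional to the integrated powers, so the ratio is unaffected.

```python
import numpy as np
from scipy.signal import butter, periodogram, sosfiltfilt

# (f_1 signal band, f_2 low noise band, f_3 high noise band) in Hz, from the text
BANDS = {
    "heartbeat": ((0.8, 2.0), (0.05, 0.8), (2.0, 50.0)),
    "respiration": ((0.1, 0.7), (0.05, 0.1), (0.7, 50.0)),
}


def band_snr(x, fs, kind):
    """SNR = P_f1 / (P_f2 + P_f3) from the periodogram after baseline removal."""
    sos = butter(4, 0.08, btype="highpass", fs=fs, output="sos")  # order 4 assumed
    f, pxx = periodogram(sosfiltfilt(sos, x), fs=fs)
    (s_lo, s_hi), (a_lo, a_hi), (b_lo, b_hi) = BANDS[kind]
    p1 = pxx[(f >= s_lo) & (f <= s_hi)].sum()   # closed interval [s_lo, s_hi]
    p2 = pxx[(f >= a_lo) & (f < a_hi)].sum()    # half-open [a_lo, a_hi)
    p3 = pxx[(f > b_lo) & (f <= b_hi)].sum()    # half-open (b_lo, b_hi]
    return p1 / (p2 + p3)


def best_channel(channels, fs, kind):
    """Pick the channel (e.g. thorax/abdomen amplitude/phase) with the highest SNR."""
    snrs = {name: band_snr(x, fs, kind) for name, x in channels.items()}
    return max(snrs, key=snrs.get), snrs
```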

Heart Inter-Beat Interval Detection
HR is not stationary over time, and its variability contains valuable information about the SNS and PNS response [43]. Figure 3a shows the frequency-domain characteristics of the NCS thorax waveform between 0.5 and 2.5 Hz, visibly showing HR varying in the range of 55.8-62.4 BPM. To accurately extract the inter-beat interval (IBI) from the smooth NCS signal, the weaker but sharper second-harmonic component of the heartbeat was used [32]. The IBI was then measured as the time for two of its cycles (one heartbeat), as denoted in Figure 3b. This process resulted in very accurate instantaneous HR estimation compared with the reference ECG, as shown in Figure 3c. Note that ECG measures the electrical stimulation of the heart, whereas NCS measures the actual heartbeat motion. The peak points in the waveform were extracted by a robust algorithm based on intersections with the moving average curve (MAC) [32].
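To make the second-harmonic IBI idea concrete, the sketch below detects peaks with a simple moving-average-crossing rule. It then takes every other peak of the second harmonic as one beat. This is an assumption-laden stand-in for the MAC algorithm of [32], not a reproduction of it. The moving-average window `win_s`, the segment rule (the maximum of each run above the moving average), and the function names are all hypothetical.

```python
import numpy as np


def mac_peaks(x, fs, win_s):
    """Peak indices from crossings of x with its (approximately centered) moving average."""
    x = np.asarray(x, float)
    w = max(1, int(round(win_s * fs)))
    mac = np.convolve(x, np.ones(w) / w, mode="same")   # moving average curve
    above = (x > mac).astype(int)
    starts = np.flatnonzero(np.diff(above) == 1) + 1     # x rises above the MAC
    ends = np.flatnonzero(np.diff(above) == -1) + 1      # x falls below the MAC
    if starts.size == 0:
        return np.array([], dtype=int)
    ends = ends[ends > starts[0]]                        # keep complete segments only
    return np.array([s + int(np.argmax(x[s:e])) for s, e in zip(starts, ends)], dtype=int)


def ibi_from_second_harmonic(peak_idx, fs):
    """IBI (s) as two cycles of the second harmonic, i.e. every other peak."""
    t = np.asarray(peak_idx, float) / fs
    return np.diff(t[::2])
```

Setting `win_s` to about one second-harmonic period (0.5 s at 60 BPM) makes the moving average track the local mean of the oscillation. The crossings then split the waveform into one above-average run per cycle.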

Respiration Waveform Extrema Detection
To investigate the correlation between heartbeat and respiration, we designed statistical features representing respiration waveform variability (RWV) using inspiration, expiration, and respiratory effort information. The peaks in the respiration waveform were extracted by the same MAC algorithm. For respiration, maxima represent the end of inspiration and are termed inspire-end points, t_e. The beginning of inspiration is represented by the inspire-begin point, t_b, which is more difficult to capture accurately using a minima-detection algorithm. We attribute this to the following reasons: (1) a longer breath pause after exhalation leads to a relatively flat waveform; (2) filter artifacts can shift the true minimum, especially around pauses; and (3) small unfiltered heartbeat or pulse motion can produce multiple local minima. Thus, a post-processing approach was employed to identify the true t_b, as follows:
1. Find the zero-crossing (ZC) points of the first derivative (ZC_1) and second derivative (ZC_2) of the respiration waveform r(t) between consecutive inspire-end peaks, t_{e−1} and t_e;
2. Select only positive-slope ZC points, with the first derivative close to 0;
3. Identify all such candidate points t as possible minima if […], and:
a. If all are minima, select the point closest to inspire-end, i.e., the t that gives min(t_e − t);
b. Otherwise, select the minimum point that gives min |[…]|.
This yields an accurate t_b for each t_e with which to define the respiratory features, allowing independent study of inspiration and expiration variability, which has not been explored in detail in earlier works [44]. Figure 4 shows the respiration waveform annotated with the detected t_b and t_e, along with the corresponding inter-respiration interval (IRI), inspiratory interval (II), expiratory interval (EI), and inspiratory volume (IV) estimates for individual breath cycles.
Figure 4. An accurate inspire-begin point is estimated for a difficult case with a pause after exhalation at t = 162 s.
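The inspire-begin search and the per-breath parameters can be sketched as follows. This is one plausible reading of the steps above, not the authors' implementation, and rests on these assumptions:

- derivatives are taken with `np.gradient`;
- "close to 0" means |r'(t)| below 5% of the largest |r'| in the interval (`flat_frac`, an assumed parameter);
- only rule 3a (the candidate closest to t_e) is implemented;
- the function falls back to the plain minimum when no candidate is found.

The parameter definitions are also assumptions: IRI is the interval between consecutive t_e, II = t_e − t_b, EI = t_b − t_{e−1}, and IV is the normalized amplitude r(t_e) − r(t_b).

```python
import numpy as np


def inspire_begin(r, fs, i_prev, i_cur, flat_frac=0.05):
    """Estimate the inspire-begin sample t_b between inspire-end peaks i_prev < i_cur."""
    d1 = np.gradient(r) * fs            # r'(t)
    d2 = np.gradient(d1) * fs           # r''(t)
    lo, hi = i_prev + 1, i_cur          # search strictly between the two peaks
    s1, s2 = d1[lo:hi], d2[lo:hi]

    # ZC_1 with positive slope: r' goes from negative to non-negative (local minima)
    zc1 = np.flatnonzero((s1[:-1] < 0) & (s1[1:] >= 0)) + 1
    # ZC_2 points where r' is also close to 0 (flat pause after exhalation)
    sign2 = np.signbit(s2)
    zc2 = np.flatnonzero(sign2[:-1] != sign2[1:]) + 1
    zc2 = zc2[np.abs(s1[zc2]) < flat_frac * np.max(np.abs(s1))]

    cands = np.union1d(zc1, zc2)
    if cands.size == 0:                 # fallback (assumption): plain minimum
        return lo + int(np.argmin(r[lo:hi]))
    return lo + int(cands.max())        # rule 3a: candidate closest to t_e


def breath_parameters(r, fs, te, tb):
    """Per-breath IRI, II, EI (s), IV (normalized units), and EI/II.

    te: inspire-end indices [e_0, ..., e_N]; tb[k]: inspire-begin between e_k and e_{k+1}.
    """
    te, tb = np.asarray(te), np.asarray(tb)
    iri = np.diff(te) / fs              # e_{k+1} - e_k
    ii = (te[1:] - tb) / fs             # inspiration: t_b -> next t_e
    ei = (tb - te[:-1]) / fs            # expiration (incl. pause): previous t_e -> t_b
    iv = r[te[1:]] - r[tb]              # trough-to-peak amplitude of each breath
    return {"IRI": iri, "II": ii, "EI": ei, "IV": iv, "EI/II": ei / ii}
```

With these definitions, II + EI equals IRI for every breath. Separating inspiration from expiration this way exposes their variability independently.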

Heartbeat and Respiratory Features
The detected heartbeat IBI and respiration IRI, II, EI, and IV parameters were used to calculate features over each windowed segment. Both R and A routines were divided into 90 s windows (t_{win}), or epochs, with a 10 s sliding interval (t_{slide}), over which ultra-short HRV [45] and RWV features were estimated. For HRV analysis, standard time- and frequency-domain metrics were derived from NCS and ECG, as follows:
1. mean(HR), mean(IBI), and std(IBI) are the mean HR, the mean IBI, and the standard deviation of IBI, computed after rejecting poor IBI values;
2. pIBI50 is the ratio of the number of successive IBIs that differ by more than 50 ms to the total IBI count, and is closely related to PNS activity;
3. LF and HF are the spectral powers in the low-frequency (0.04-0.15 Hz) and high-frequency (0.15-0.4 Hz) bands, and LF/HF indicates the balance between SNS and PNS activity [22].
The RWV features included the mean and standard deviation of IRI, II, EI, and IV, their first and second successive differences (SD_1, SD_2), and the EI/II ratio, which is known to be related to stress level [46]. RR was also estimated as a function of mean(IRI) over 15 s. All 36 RWV and 7 heartbeat features are listed in Table 1. The RWV features were estimated from both the highest-SNR NCS respiration waveform and the reference chest-belt waveforms. Nonlinear entropy-based features were found to be unreliable, with high dependence on the sample size [47], and were not included.
Table 1. HRV and RWV classification features.
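The windowing and the HRV metrics above can be sketched with NumPy as follows. The epoch count follows directly from the stated t_{win} = 90 s and t_{slide} = 10 s; for example, a 290 s R routine yields 21 epochs. Rejection of poor IBI values is omitted, and the pIBI50 denominator follows the text (total IBI count). The LF/HF computation linearly resamples the IBI series at 4 Hz before a one-sided periodogram. That resampling step is an assumed implementation choice; the text does not specify it.

```python
import numpy as np


def epoch_starts(duration_s, t_win=90.0, t_slide=10.0):
    """Start times (s) of all complete t_win windows advanced by t_slide."""
    if duration_s < t_win:
        return np.array([])
    n = int(np.floor((duration_s - t_win) / t_slide + 1e-9)) + 1
    return np.arange(n) * t_slide


def hrv_features(beat_t, ibi, fs_interp=4.0):
    """Ultra-short HRV features for one epoch (beat_t in s, ibi in s)."""
    beat_t, ibi = np.asarray(beat_t, float), np.asarray(ibi, float)
    hr = 60.0 / ibi                                    # instantaneous HR (BPM)
    # successive differences > 50 ms, divided by the total IBI count
    pibi50 = np.count_nonzero(np.abs(np.diff(ibi)) > 0.050) / ibi.size

    # Frequency domain (assumption): uniform resampling + one-sided periodogram
    t_u = np.arange(beat_t[0], beat_t[-1], 1.0 / fs_interp)
    x = np.interp(t_u, beat_t, ibi)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs_interp * x.size)
    psd[1:] *= 2.0        # one-sided density (Nyquist bin also doubled; irrelevant here)
    f = np.fft.rfftfreq(x.size, d=1.0 / fs_interp)
    df = fs_interp / x.size
    lf = psd[(f >= 0.04) & (f < 0.15)].sum() * df      # s^2
    hf = psd[(f >= 0.15) & (f < 0.40)].sum() * df      # s^2
    return {
        "mean_hr": hr.mean(),
        "mean_ibi": ibi.mean(),
        "std_ibi": ibi.std(ddof=1),
        "pibi50": pibi50,
        "lf": lf,
        "hf": hf,
        "lf_hf": lf / hf if hf > 0 else float("nan"),
    }
```

A 90 s window gives a frequency resolution of roughly 0.011 Hz. The LF band therefore spans only about ten periodogram bins, which is one reason ultra-short frequency-domain HRV features are less stable than the time-domain ones.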
In Figure 5a, we present the correlation plot between NCS and ECG IBI data, which achieves a high Pearson's correlation coefficient of r = 0.961. The Bland-Altman plot in Figure 5b shows high agreement between the two measurements: the x-axis is the mean of the two measurements, and the y-axis is their difference. The dotted middle line at −0.003 s shows a low mean bias (m). The other lines show the limits of agreement (LoA), within which 95% of the differences are expected to lie, calculated as m ± 1.96 · σ, where σ is the standard deviation of the differences. Similarly, Figure 5c,d shows the scatter and Bland-Altman plots between the NCS and reference IRI. The high correlation between NCS and reference heartbeat and respiratory features supports the accuracy and robustness of our system, and the low m and narrow LoA indicate small, uncorrelated errors between NCS and reference estimates.
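The agreement statistics used in Figure 5 can be computed as below. This is a generic sketch of Pearson correlation and Bland-Altman analysis. Taking the difference as NCS minus reference, and using the sample standard deviation (ddof = 1) for σ, are assumptions.

```python
import numpy as np


def agreement(ncs, ref):
    """Pearson r plus Bland-Altman bias and 95% limits of agreement."""
    ncs, ref = np.asarray(ncs, float), np.asarray(ref, float)
    r = np.corrcoef(ncs, ref)[0, 1]
    x = (ncs + ref) / 2.0            # Bland-Altman x-axis: mean of each pair
    y = ncs - ref                    # y-axis: difference (NCS minus reference, assumed)
    m = y.mean()                     # mean bias
    sd = y.std(ddof=1)               # sample SD of the differences
    return {"r": r, "bias": m, "loa": (m - 1.96 * sd, m + 1.96 * sd), "x": x, "y": y}
```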

Results
In this section, we present accuracy statistics for NCS-based inattention vs. attention detection. The current literature has mostly focused on HRV-based emotion and fatigue detection, owing to high sensor reliability and greater comfort than tension chest belts over the study duration. Here, we address this gap with an additional performance comparison of respiratory and cardiac features. To further characterize users' attentive state, correlation trends between the probability of correct response (PoCR) and RT are presented for the short study duration. MATLAB toolboxes were used for the following analysis. Figure 6 presents the system architecture flowchart, including (a) signal processing and feature extraction and (b) the machine learning algorithm for attention detection.

Attention and Relaxed-Inattention Classification
For epoch-based analysis, the initial 10 s of R-routine data were discarded to allow participants to settle, to reduce motion artifacts, and to account for any delay in reaching the relaxed, inattentive state. To simplify the current analysis, participants were asked to stay as stationary as possible, and the recordings of two participants were truncated early to reject data corrupted by motion artifacts. Thus, 290 s and 390 s of data were extracted from the R and A routines, respectively, for all participants, except for Participants 3 and 12, whose A-routine data lasted 370 s and 220 s, respectively.
Although the dataset is limited and the epochs are short, this higher time resolution is advantageous, because a user's sudden inattentiveness may be detrimental to the task. For attention vs. relaxed-inattention classification, we tested two approaches: (1) leave-one-subject-out and (2) a personalized prediction model. A single algorithm was selected by 5-fold cross-validation (CV) for consistent comparison across both approaches. The kNN classifier achieved the best accuracy for binary attention vs. relaxed-inattention classification for each 90 s epoch, compared with SVM, QDA, and boosted and bagged tree algorithms, as shown in Table 2. NCS and the reference achieved similar accuracies of 98.2% and 98.5%, respectively, using all the features described in Section 3.3.
Using the kNN model, the leave-one-subject-out test resulted in an accuracy drop to 59.8% and 60.5% for NCS and the reference, respectively. This suggests a strong influence of the personal baseline on the model, consistent with other works in this area [48]. In the second approach, a personalized prediction model was designed for each user, using a small subsection of the data for training and the remaining out-of-time data for testing. The first 180 s of data from both routines were used for training, with no time overlap between the training and test epochs. A shorter A-routine training duration was selected for Participant 12 so that at least one test epoch remained. A 50% holdout of the training data was used for validation and tuning. Good test accuracies of 83.2% and 80.0% were achieved by NCS and the reference, respectively. Figure 7 shows the test accuracy distribution across all participants. Detailed results for individual participants are given in Table 3, showing 85.0% average sensitivity and 79.8% average specificity for classification by NCS. We attribute the lower test performance of some participants to variation in their attention and relaxation levels over time.
This is also consistent with the participant reports of varying attention levels over time. For example, Participant 7 reported, "I maintained the same level of relaxation throughout the relaxation phase, but at the attention phase, I was more attentive at first, but slowly got less so towards the end." Some participants felt increasingly drowsy, as Participant 14 described: "I feel [felt] relaxed throughout the relaxation test phase, and slightly sleepy towards the end. During the attention test phase, I felt alert and slightly stressed as I got a couple wrong. Towards the end of the attention phase, I felt a little tired/hypnotized from looking at the small movements of the clock hand."
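The two evaluation protocols described above can be sketched in plain Python. This sketch is illustrative only: it uses a Euclidean kNN with an assumed k = 3 and per-feature z-scoring fitted on the training data, which are our assumptions rather than the study's reported settings, and all function names are hypothetical. Here, loso_accuracy holds out every epoch of one participant at a time (approach 1); chronological_split keeps epochs that end within the first 180 s of their own routine for training and uses only later epochs for testing (approach 2); and sensitivity_specificity treats the attention class as positive. The 50% validation holdout used for tuning is not reproduced.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training epochs nearest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def standardize(train_X, test_X):
    """Z-score both sets using per-feature mean/std from the training set only."""
    n, d = len(train_X), len(train_X[0])
    mu = [sum(r[j] for r in train_X) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in train_X) / n) or 1.0 for j in range(d)]

    def z(rows):
        return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

    return z(train_X), z(test_X)


def predict_split(X, y, train_idx, test_idx, k=3):
    """Fit on train_idx; return (true labels, predicted labels) for test_idx."""
    Xtr, Xte = standardize([X[i] for i in train_idx], [X[i] for i in test_idx])
    ytr = [y[i] for i in train_idx]
    return [y[i] for i in test_idx], [knn_predict(Xtr, ytr, x, k) for x in Xte]


def loso_accuracy(X, y, subjects, k=3):
    """Approach (1): accuracy for each participant when all of their epochs are held out."""
    acc = {}
    for s in sorted(set(subjects)):
        train = [i for i, g in enumerate(subjects) if g != s]
        test = [i for i, g in enumerate(subjects) if g == s]
        y_true, y_pred = predict_split(X, y, train, test, k)
        acc[s] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc


def chronological_split(epoch_start_s, train_until_s=180.0, epoch_s=90.0):
    """Approach (2), one participant: epochs ending by train_until_s (measured from the
    start of their own routine) train; epochs starting at or after it test."""
    train = [i for i, t in enumerate(epoch_start_s) if t + epoch_s <= train_until_s]
    test = [i for i, t in enumerate(epoch_start_s) if t >= train_until_s]
    return train, test


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), attention = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

In this sketch, the z-scoring statistics in the leave-one-subject-out case come only from the other participants, so a participant-specific offset in baseline features is left uncorrected for the held-out subject. This is one way a strong personal baseline can degrade cross-subject accuracy, whereas the chronological split trains and tests on the same person.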

Cardiac and Respiratory Feature Comparison
To understand the individual contributions of respiration and heartbeat features, classifier performance was tested with only one feature set at a time. For NCS, this resulted in a higher average 5-fold CV accuracy of 97.7% from the RWV model compared to 89.8% from the HRV model. A similar trend was observed with the reference ECG and chest belt sensors, which showed accuracies of 96.9% and 86.7% using RWV and HRV features, respectively. Figure 8 shows confusion matrices for NCS with all features and with only HRV features (Figure 8a,c), compared to the reference BIOPAC (Figure 8b,d). The NCS RWV-only model (97.7%) performed very close to the model using both RWV and HRV feature sets (98.2%), which suggests that RWV features carry substantial attention-specific information and that the information in the RWV and HRV features overlaps. A potential reason for this is RSA coupling between respiration and heartbeat. Differences in signal quality between RWV and HRV may also play a role, as may the short epoch duration available for frequency-domain features.
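The single-feature-set comparison can be illustrated as a feature-subset ablation under k-fold CV. In this sketch, the column indices assigned to HRV and RWV features, the fold assignment (a seeded shuffle), k = 3, and the assumption that features are already standardized are all hypothetical; only the idea of re-running the same classifier on one feature set at a time comes from the text.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training epochs (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kfold_accuracy(X, y, cols, n_folds=5, k=3, seed=0):
    """Mean k-fold CV accuracy of kNN using only the feature columns in `cols`.

    Assumes features are already standardized and len(X) >= n_folds.
    """
    Xs = [[row[c] for c in cols] for row in X]
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    accs = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        Xtr, ytr = [Xs[i] for i in train], [y[i] for i in train]
        correct = sum(knn_predict(Xtr, ytr, Xs[j], k) == y[j] for j in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)


# Hypothetical column layout: which columns hold HRV vs RWV features is an assumption.
FEATURE_SETS = {"HRV": [0, 1], "RWV": [2, 3], "HRV+RWV": [0, 1, 2, 3]}
```

Calling kfold_accuracy with the HRV columns, the RWV columns, and their union mirrors the three configurations compared above; keeping the seed fixed ensures that all three are evaluated on identical folds.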

Participant Response Characterization
As an extension of attention versus relaxed-inattention detection, we also investigated the response characteristics of participants during the attention routine to search for correlations or trends among PoCR, RT, and HR. Because respiratory features have longer periods than cardiac ones, we selected HR to study short-term variation around each event.
The first metric compares the variation in average HR across both routines with the RT of each participant. The ratio of average HR during the A and R routines, mean(HR_A)/mean(HR_R), is plotted as a function of average RT, mean(RT), for each participant in Figure 9a. Within the limited number of participants, we observed an increasing trend in mean(HR_A)/mean(HR_R), which exceeded 1 as mean(RT) approached 500 ms and then gradually decreased to around 1 for higher mean(RT). An increase in HR could be associated with increased stress or surprise. Thus, quick responses with mean(RT) < 475 ms and a ratio below 1 are likely associated with low stress, followed by an elevated stress response, before the ratio settled to ~1 at high RT. Here, mean(RT) excludes missed responses, which were assigned a fixed maximum RT.
The second metric takes the correctness of responses into account, in addition to RT. An event is defined as any correct or incorrect user response to a clock jump. The HR estimated in pre-event and post-event windows of 5 s and 10 s was used to calculate HR_Post/HR_Pre for each event. The RTs of all events were grouped into 100 ms bins, and the mean(HR_Post/HR_Pre) metric was estimated along with PoCR in each RT bin, as shown in Figure 9b. Here, PoCR is defined as the fraction of correct events out of the total events. We observed that mean(HR_Post/HR_Pre) [5 s] < 1 for RT ∈ [200, 400) ms, with similar trends for both window sizes. As RT increased to the range [400, 600) ms, PoCR oscillated around 0.9 with a slightly elevated mean(HR_Post/HR_Pre), which stabilized at higher RT. In other words, when a participant was expected to be in an attention state, the following scenarios could be expected:

1. A very quick reaction (RT ≤ 200 ms) had a high probability of being incorrect;
2. A moderately fast response with RT ∈ [250, 350) ms indicated a high PoCR and mean(HR_Post/HR_Pre) ~1. This can be considered the period when the user had mastered the game with full attentiveness. However, scenarios (1) and (2) involve small numbers of events (1 and 35, respectively), so these deductions should be viewed as preliminary;
3. Most RTs were in the range of 400-600 ms. Interestingly, RT > 400 ms was associated with a stable PoCR ~0.9 and mean(HR_Post/HR_Pre) ~1, indicating that slower responses were not necessarily incorrect. This could be because RT was not a criterion on which participants were judged.
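The second metric, HR_Post/HR_Pre per event binned by RT together with PoCR, can be sketched as follows. The sketch assumes that heartbeat times (e.g., detected peaks) are available in seconds, that HR in each window equals 60 divided by the mean inter-beat interval of the beats inside it, and that each event is anchored at a single timestamp (for example, the response time); these choices and all function names are our assumptions, not the study's exact procedure.

```python
import math
from collections import defaultdict


def window_hr(beat_times_s, t0, t1):
    """Mean HR (bpm) = 60 / mean inter-beat interval of the beats in [t0, t1)."""
    beats = sorted(t for t in beat_times_s if t0 <= t < t1)
    if len(beats) < 2:
        return float("nan")
    ibis = [b - a for a, b in zip(beats, beats[1:])]
    return 60.0 / (sum(ibis) / len(ibis))


def hr_ratio(beat_times_s, event_t, win_s):
    """HR_Post / HR_Pre with equal-length windows before and after the event time."""
    pre = window_hr(beat_times_s, event_t - win_s, event_t)
    post = window_hr(beat_times_s, event_t, event_t + win_s)
    return post / pre


def bin_by_rt(events, beat_times_s, win_s=5.0, bin_ms=100):
    """events: iterable of (event_time_s, rt_ms, is_correct).

    Returns {bin_start_ms: (mean HR_Post/HR_Pre, PoCR, n_events)}; events whose
    ratio is undefined (too few beats) still count towards PoCR.
    """
    bins = defaultdict(list)
    for t, rt_ms, ok in events:
        bins[int(rt_ms // bin_ms) * bin_ms].append((hr_ratio(beat_times_s, t, win_s), ok))
    summary = {}
    for b in sorted(bins):
        ratios = [r for r, _ in bins[b] if not math.isnan(r)]
        pocr = sum(ok for _, ok in bins[b]) / len(bins[b])
        mean_ratio = sum(ratios) / len(ratios) if ratios else float("nan")
        summary[b] = (mean_ratio, pocr, len(bins[b]))
    return summary
```

Passing win_s=10.0 gives the longer-window variant. Reporting the event count per bin makes it easy to flag sparsely populated bins, such as those behind scenarios (1) and (2).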
EEG and ECG have been the physiological signals most frequently used for cognitive monitoring; however, both require contact electrodes. Our work is based on the simultaneous extraction of respiratory and cardiac features with an over-the-clothes sensor and can achieve relatively high accuracy while offering deployment convenience and continuous long-term monitoring. Unlike existing RF technologies, this sensor is not affected by ambient motion and does not require the user to be in isolation [32]. The personalized prediction model results in Section 4.1 show our ability to detect a participant's attentive or relaxed-inattentive state over a 90 s window when trained on a short baseline (3 min for each state). This high time resolution can allow changes in a user's attentiveness to be monitored over time. Moreover, our results indicate superior performance of respiratory features for vigilance-based attentive state classification. The nonintrusive, low-cost nature of this sensing technology makes it easy to explore respiratory signal features for other purposes, unlike a chest belt, which requires sufficient tension, or a nasal flow meter, which is highly intrusive.
We also explored changes in sustained attention over the test duration, as shown in Figure 10. Here, attention level is defined as the relative ratio (C − W − M)/(C + W + M), where C, W, and M are the numbers of correct, wrong, and missed events, respectively. The box plot in Figure 10 shows the distribution of attention level across participants as time progressed, with each scatter point representing an individual participant's value. We observe a slightly lower median value and a higher interquartile range (IQR) during the initial trial phase and after 4 min. This trend is reasonable, since participants took the initial 1-2 min to learn the game and then, based on the participant reports noted earlier, became drowsy or tired as the game continued. This definition of attention has limited scope, as it does not include RT, which has been shown to be related to attention and fatigue in earlier work [53].
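The attention-level metric and the per-bin box-plot summary can be computed as below. The one-minute bin width, the encoding of events as timestamped "C", "W", and "M" labels, and the function names are assumptions for illustration. Note that statistics.quantiles defaults to the "exclusive" method, so the IQR may differ slightly from the one drawn by a given plotting library.

```python
import statistics


def attention_level(c, w, m):
    """(C - W - M) / (C + W + M); ranges from -1 (no correct events) to +1 (all correct)."""
    total = c + w + m
    return (c - w - m) / total if total else float("nan")


def per_minute_levels(outcomes, bin_s=60.0):
    """outcomes: iterable of (time_s, kind), kind in {"C", "W", "M"}.

    Returns {bin_index: attention level} for every bin that contains an event.
    """
    counts = {}
    for t, kind in outcomes:
        counts.setdefault(int(t // bin_s), {"C": 0, "W": 0, "M": 0})[kind] += 1
    return {b: attention_level(v["C"], v["W"], v["M"]) for b, v in sorted(counts.items())}


def median_iqr(values):
    """Median and interquartile range across participants for one time bin (>= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1
```

Applying per_minute_levels to each participant and then median_iqr to the values collected for each minute gives one box per time bin, analogous to the distribution summarized in Figure 10.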
The limitations of this study are as follows. A major limitation concerns the ground truth of the attentive and relaxed-inattentive states. While we used an established vigilance-based attention task, multiple participants reported feeling drowsy towards the end due to its monotonous nature. Attention can be interpreted differently, as short or sustained attention, and at times may involve only thinking, without a quick RT requirement. All these cases may induce different features, and this model may not generalize to them. Further, the relaxation-inducing routine is likely dominated by the participant's baseline feelings of stress, happiness, or any other emotion that can affect heartbeat and respiratory features differently, as indicated by the poor leave-one-subject-out performance in this work. In earlier work [48], the baseline variation of one user was studied over multiple weeks and showed more day-to-day variation than variation between emotions on the same day. Hence, this is an important limiting factor in the attention research domain. Another limitation is the short study duration, which could lead the model to learn short-term temporal similarities instead of generalizing to minor variability in relaxation or attention levels. It is important to train with optimized features over longer durations and to observe performance variation over time for the same individual. Lastly, while sensor performance has been established in previous works, it needs to be validated across greater diversity in age, BMI, and health condition.
Future research should bridge the gap between emotion and attention monitoring by utilizing both HRV and RWV features from the NCS sensor, which can also be integrated into furniture [54] so that participants are not aware of being monitored. Such covert sensing would reduce participant distraction and nervousness and, thus, decrease systematic bias. Furthermore, our sensor can offer critical information for studying RSA, voluntary respiration manipulation, and their effects on cardiovascular [55] and skin conductance [56] changes. This area of research has also been associated with beneficial effects on mental and physical health [57]. Thus, we believe that our sensor hardware and classification algorithms offer multiple benefits and are valuable for comprehensive healthcare by providing comfortable, continuous vital-sign information.

Conclusions
In this paper, we have demonstrated the use of a noninvasive wearable sensor setup for detecting the relaxed-inattentive vs attentive state of a user. This can pave the way for large-scale future studies that could help mitigate risk factors in daily-life situations, such as driving, as well as enable daily cognitive monitoring of elderly patients with dementia. We showed strong reliability of the NCS sensor for cardiac and respiratory variability feature extraction compared to the standard reference ECG and chest belts. Our results indicate a major contribution from respiratory features to attention detection. To the best of our knowledge, this is the first work to use noninvasive respiratory sensing for attention vs relaxed-inattention classification from accurate estimates of respiratory features, such as II and EI.
Informed Consent Statement: Written informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data presented in this study are available on request from the corresponding author; they are not publicly available for privacy reasons.